Ability  and  Military  Tasks


Abstract 

Strength  influences  the  performance  of  military  physical  tasks.  These  influences  can  be 
summarized  by  models  that  treat  strength  as  a  general  dimension  that  affects  performance  on 
tasks  in  general.  Previous  findings  also  indicate  that  combining  a  general  strength  (GS) 
dimension  with  an  aerobic  capacity  (AC)  dimension  yields  a  model  that  accounts  for  the  full 
pattern  of  association  between  physical  ability  tests  and  lifting  and  carrying.  This  study 
attempted  to  replicate  the  earlier  findings  using  a  strength  test  battery  with  some  new  strength 
measures,  a  different  set  of  military  tasks,  and  a  different  military  population.  Structural  equation 
models  were  constructed  to  represent  strength  as  a  single  construct,  a  two-dimensional  construct 
based  on  measurement  modality,  and  a  seven-dimensional  model  based  on  specific  functional 
movements.  Performance  was  represented  as  a  single  general  performance  dimension  that  added 
digging  and  casualty  evacuation  to  the  manual  materials-handling  tasks  that  had  been  previously 
studied.  A  modified  unidimensional  model  maximized  the  prediction  of  performance.  Adding 
AC  to  the  strength  model  improved  performance  prediction,  but  adding  muscle  endurance  (ME) 
and  anaerobic  power  (AP)  did  not.  The  results  provided  a  very  close  replication  of  earlier 
findings  while  extending  the  model  to  a  wider  range  of  military  tasks  and  a  new  population. 
Strength  influences  the  performance  of  many  physical  tasks.  The  strength-performance
association  can  be  modeled  by  representing  strength  as  a  single  general  factor  and  performance 
as  a  general  factor  (Vickers,  1995,  1996,  2003a;  Vickers,  Hodgdon,  &  Beckett,  2009).  Such 
models  have  produced  strength-performance  correlations  ranging  from  r  =  .32  to  r  =  .96.  Typical 
values  fall  in  the  middle  of  this  range,  but  the  model  has  explained  the  overall  pattern  of 
association  of  specific  strength  tests  with  the  performance  of  specific  tasks  in  every  instance.  The 
strong  correlations  between  the  general  factors  combined  with  the  lack  of  substantial  residual 
associations  indicate  that  a  general  strength  (GS)  dimension  adequately  summarizes  the  results  of 
strength  testing  when  predicting  performance.  Specific  strength  measures  do  not  have  to  be 
matched  to  specific  tasks. 

The apparent importance of general muscular strength for task performance may be
misleading. Two studies have shown that the association of GS with task performance is weaker
when other physical abilities are considered (Vickers, 2003a; Vickers et al., 2009). The
association of strength with performance may be inflated because some of the effects of other
causal variables are attributed to GS. This inflation would be an example of omitted variable bias
(James, Mulaik, & Brett, 1982). The most recent effort indicated that the combination of GS and
aerobic capacity (AC) was sufficient to represent a range of more complex models that could
have been generated by adding muscle endurance (ME) and/or anaerobic power (AP) to the
model. If this finding can be replicated, the potential for omitted variable bias would be
substantially reduced since ME and AP could be eliminated as potential sources of bias.

The tasks examined also limit the inferences that can be drawn from previous studies.
Nearly all of the tasks have been manual materials-handling tasks involving lifting, carrying,
pushing, pulling, and so forth. While such tasks capture much of the variation in the
physical demands of military jobs (e.g., Beckett & Hodgdon, 1987; Rayson, Holliman, &
Belyavin, 2000; Robertson & Trent, 1982, 1985; Singh et al., 1991), they may not represent the
full range of tasks that are important in the military. Critical task analyses often lead to the
conclusion that other tasks such as digging and load carriage are essential for military
effectiveness (Rayson et al., 2000; Singh et al., 1991).

This report extends the previous modeling of physical abilities and performance in four
directions. First, the effects of the strength measurement modality are examined. One previous
study of modality effects produced ambiguous results because isokinetic, isometric, and
isoinertial strength measures all predicted task performance and none was clearly superior to the
others (Dempsey, Ayoub, & Westfall, 1998). Second, a model based on GS is compared with a
model with a number of more specific strength dimensions (e.g., handgrip). This topic is
explored because it is reasonable to think that accurate task prediction might require matching
tests to tasks. Third, the problem of omitted variable bias is explored further with new measures
of ME. Fourth, coverage of the task domain has been increased. Most of the earlier work has
been done with tasks that require a brief maximal effort (Vickers, 1995, 1996). Subsequent
models have introduced longer-lasting tasks, but these tasks have been mixed with short maximal
performance tasks (Vickers, 2003a; Vickers et al., 2009). The task set in this study emphasizes
efforts that extend over several minutes. The study goal was to replicate the previous findings, if
replication was possible, with these several extensions that substantially broadened the scope of
the area of inquiry.
Methods


Sample 


The sample consisted of 116 male soldiers. Their average age was 25.5 years (SD = 5.8),
their average weight was 79.4 kg (SD = 10.6), and their average height was 177.4 cm (SD = 7.6;
see Singh et al., 1991, p. 242). Only 88 of these participants completed the field test battery, so
the sample size was set at 88 for the present structural equation model (SEM) analyses.

Physical  Ability  Tests 

Aerobic Capacity. The test protocol for assessing AC required subjects to walk on a
treadmill at 88.9 m/min while carrying a 24.5-kg fighting order load. The test began with a 2-min
warm-up at 0% incline. Following the warm-up, the incline of the treadmill was increased 2%
every 3 min until anaerobic threshold was reached. The incline then was increased 2% every
minute until maximal oxygen uptake was reached. No specific criteria for determining that a
valid maximum had been achieved were reported, and there was no indication of whether the
measurement relied on a plateau in oxygen uptake or was a peak value.

Anaerobic Power. AP measurements consisted of 30-s “supramaximal” bouts on
modified Monark ergometers. The modifications provided an interface with a computer system
that calculated and provided resistance loads based on the subject’s body weight. The exercise
bout consisted of a warm-up period and the test period. Three seconds after warm-up, subjects
were instructed to increase their pedaling speed to maximum. Load was applied at maximum, and
the subject worked for 30 s with encouragement given in the last 5 s. A computer recorded the test
resistance, revolutions per minute (RPM), peak power output, mean power output, power decline
(i.e., the percentage decrease from the peak power output to the end of the bout), and total work
during the bout. Separate AP tests were performed for the legs and the arms.

Isometric Strength Tests. Strength measurements were made with a system that required
the test subject to exert force on a bar or handle that was attached by a cable to a load cell.

Handgrip. A handgrip dynamometer was used to determine the maximum grip strength
of each hand.

Arm flexion. The subject grasped the bar at shoulder width with the arm at a 105° angle.

Trunk flexion. A shoulder harness connected to a load cell via two pulleys was used. The
test was executed at a hip angle of 160°. The chain that connected the apparatus to the
load cell was at the test subject’s back.

Trunk extension. Participants stood on a platform with the lateral borders of the feet at
shoulder width. The individual assumed a lifting position holding the bar with an
over-and-under grip. The subject’s arms were fully extended, and his hips were flexed at a
160° angle. Maintaining that flexion, the subject then pulled up using the muscles in his
back while keeping his back straight.
Isokinetic-Concentric Strength Tests. For isokinetic tests, a motor was attached to the
cable to control the rate of movement of the bar. A goniometer was used to control the angles for
the range of motion.

Arm  flexion.  Subject  grasped  the  bar  at  shoulder  width  with  arms  straight  (180°)  at  the 
beginning  of  the  test.  Subject  performed  a  concentric  contraction  bringing  the  arms  to  a 
40°  angle  with  an  angular  velocity  of  30°/s.  Thus,  a  contraction  lasted  4.67  s. 

Leg  extension.  Subject  began  in  a  standing  position  with  feet  shoulder-width  apart.  He 
then  flexed  his  legs  to  a  90°  angle  and  grasped  a  handle  that  was  adjusted  to  his  height. 
The  subject  then  stood  up,  extending  the  knees  to  a  180°  angle,  and  generating  as  much 
force  as  possible  during  the  process.  The  movement  was  30°/s,  so  exertion  lasted  3  s. 

Trapezius  lift.  Subject  stood  with  feet  at  shoulder  width  grasping  handles  that  were  38.5 
cm  apart  to  mimic  the  grip  used  in  lifting  ammunition  boxes.  At  the  start  of  the  motion, 
the  arms  were  fully  extended.  Subject  then  raised  his  arms  to  clavicle  level  at  a  rate  of 
30°/s.  The  duration  of  the  contraction  varied  from  subject  to  subject  because  of  height 
differences. 

Bench  press.  Subject  was  supine  on  a  bench.  The  bar  connected  to  the  load  cell  was  set  at 
1  inch  above  his  sternum.  Subject  extended  his  arms  to  their  full  extension  at  a  rate  of 
30°/s. 

Trunk  extension.  Subject  began  with  body  flexed  at  the  hips  to  150°.  Keeping  his  back 
and  legs  straight,  the  subject  then  straightened  up  to  170°.  The  test  was  performed  at  an 
angular  velocity  of  15°/s,  so  extension  took  1.3  s. 

Trunk flexion. Subject began with body slightly flexed at the hips to form an angle of
170°. Subject then bent forward keeping back and legs straight until the angle diminished
to 150°. The test was performed at an angular velocity of 15°/s, so flexion took 1.3 s.

Knee extension. The knee extension test was performed on a Cybex dynamometer (Cybex
International, Inc., Medway, MA). The subject was seated with his knee at a 90° angle.
He then straightened his leg to a 180° angle. The movement was at an angular velocity of
180°/s. The knee extension test was performed separately for the right and left legs.

Knee  flexion.  The  knee  flexion  test  was  performed  on  a  Cybex  dynamometer.  The  subject 
was  seated  with  his  knee  at  a  180°  angle.  He  then  flexed  his  leg  to  a  90°  angle.  The 
movement  was  at  an  angular  velocity  of  180°/s.  The  knee  flexion  test  was  performed 
separately  for  the  right  and  left  legs. 
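
The contraction times quoted for the isokinetic tests follow from the range of motion divided by the angular velocity. The short sketch below reproduces that arithmetic for three of the tests; the function name and dictionary keys are illustrative labels, not part of the original protocol.

```python
def contraction_duration(start_deg, end_deg, velocity_deg_per_s):
    """Seconds needed to move between two joint angles at a fixed angular velocity."""
    return abs(end_deg - start_deg) / velocity_deg_per_s


# Angles and velocities are taken from the test descriptions above.
durations = {
    "arm_flexion": contraction_duration(180, 40, 30),      # about 4.67 s
    "leg_extension": contraction_duration(90, 180, 30),    # 3 s
    "trunk_extension": contraction_duration(150, 170, 15), # about 1.3 s
}

for test, seconds in durations.items():
    print(f"{test}: {seconds:.2f} s")
```

The same division explains why no single duration is given for the trapezius lift: its range of motion depended on each subject's height.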

Muscle Endurance. The four ME tests assessed the ability to sustain submaximal
muscular exertion. For three of the tests, the exertion was continuous. Performance was the
length of time that the required exertion could be sustained. For the fourth test, the exertion was
intermittent and was repeated until the test subject was unable to continue. The specific tests
were:


Handgrip.  Using  the  handgrip  dynamometer,  subjects  were  directed  to  generate  a  grip  of 
21  kg  and  to  maintain  it  as  long  as  possible.  The  target  force  was  chosen  to  equal  the 
weight  of  a  full  jerry  can.  Subjects  were  given  encouragement  during  the  test.  The  test 
was  stopped  when  the  force  of  the  muscle  contraction  dropped  below  the  targeted  value 
and  the  participant  was  no  longer  able  to  return  to  the  required  value  within  2  s.  Separate 
tests  were  performed  for  the  right  and  left  hands. 

Isometric  arm  flexion.  Subjects  held  a  free-weight  bar  weighing  20  kg  with  their  arms  at  a 
105°  angle  as  determined  by  a  goniometer.  The  test  stopped  when  the  arm  angle  could  no 
longer  be  maintained  even  with  encouragement. 

Trapezius  lift.  The  subject  stood  erect  with  his  feet  shoulder-width  apart.  He  held  a  20.9- 
kg  load  with  his  arms  at  his  sides  and  fully  extended.  He  then  lifted  the  20.9-kg  load  to 
clavicle  height  at  a  rate  of  6  times  per  minute.  The  lifting  rate  was  controlled  by  a 
metronome.  The  lifts  continued  until  the  subject  was  unable  to  maintain  the  required  pace 
or  had  performed  100  lifts. 

Field  Performance  Test  Battery 

The field performance test battery was constructed to represent a range of critical tasks
that might be performed in combat (Singh et al., 1991). Experienced military personnel identified
the critical tasks. The specific tasks employed by Singh et al. (1991) were:

Digging Slit Trenches. A metal box with dimensions 1.8 m long × 0.6 m wide × 0.45 m
deep was filled with standardized gravel (<1 cm diameter) to a total volume of 0.5 m³.
Participants were instructed to shovel the gravel at the maximum rate possible. The task
ended when all of the gravel had been removed. The instructions to participants suggest
that this included some sweeping up of the last remains by hand until it was no longer
possible to pick up a handful of gravel. The test score was the time in seconds.

Casualty  Evacuation.  Subject  was  required  to  evacuate  another  individual  of 
approximately  the  same  height  and  weight  over  a  distance  of  100  m.  The  fireman’s  carry 
was  used  to  transport  the  casualty.  During  the  test,  the  test  subject  wore  a  uniform  and 
carried  a  weapon  and  webbing.  This  test  was  performed  with  maximal  effort.  The  test 
score  was  the  time  in  seconds. 

Jerry  Can  Carry.  Subject  picked  up  a  21-kg  jerry  can  of  water,  carried  it  35  m,  emptied  it
into  a  funnel  at  the  height  of  a  truck  bed  (1.3  m),  and  ran  back  to  the  starting  line.  He  then 
picked  up  another  can  and  repeated  the  process.  A  trial  consisted  of  3  roundtrips  to  and 
from  the  starting  line.  The  time  for  the  event  was  recorded  when  the  foot  touched  the 
starting  line  after  the  third  carry.  The  test  score  was  the  time  in  seconds. 
Handle Manual Material. Ammunition boxes weighing 20.9 kg were lifted from the floor
and placed on a flat surface 1.3 m above the floor. The distance lifted equaled the height
of a lift to a truck bed. The test consisted of performing 48 such lifts. Subjects wore a
Polar Sport Tester heart rate monitor (Polar Electro Inc., Lake Success, NY) and were
instructed to perform the series of lifts at a rate that equaled 70% of maximal aerobic
power. A submaximal work rate was adopted to minimize the risk of injury. The test
score was the time in seconds.

The  original  study  included  a  weight-loaded  march  as  a  fifth  task.  Soldiers  marched  at  a 
pace  of  88.9  m/min  while  carrying  a  24.5-kg  load.  Subjects  continued  at  the  set  speed  for  16  km 
or  until  they  could  no  longer  maintain  the  required  pace.  The  test  score  was  the  distance  covered. 
Because  77.3%  of  all  participants  completed  the  full  16  km,  the  variation  in  performance  was  too 
restricted  to  analyze  the  relationships  to  ability  measures  with  confidence.  This  performance 
measure  was  dropped  from  the  analysis. 

Analysis  Procedures 

The correlation matrix and descriptive statistics for all the measurements were extracted
from Singh et al. (1991; most of the correlation matrix can be found on pp. 278-280 of that
report). This information was used to construct the covariance matrix that was analyzed.
Statistics were reported to two decimal places, so it is possible that limited precision had an
effect on the model evaluations presented later in this report. Appendix A provides the relevant
descriptive statistics from Singh et al. (1991).
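
As a minimal sketch of how a covariance matrix can be rebuilt from published correlations and standard deviations, each covariance is the correlation multiplied by the two standard deviations. The two SDs below are the weight and height values from the Sample section; the correlation of .50 is a made-up placeholder, not a value from Singh et al. (1991).

```python
def covariance_from_correlation(corr, sds):
    """Rescale a correlation matrix into a covariance matrix: cov[i][j] = r[i][j] * sd[i] * sd[j]."""
    n = len(sds)
    return [[corr[i][j] * sds[i] * sds[j] for j in range(n)] for i in range(n)]


# Hypothetical two-variable example (weight in kg, height in cm).
corr = [[1.00, 0.50],
        [0.50, 1.00]]   # 0.50 is an assumed correlation for illustration only
sds = [10.6, 7.6]        # SDs reported in the Sample section

cov = covariance_from_correlation(corr, sds)
for row in cov:
    print([round(value, 2) for value in row])
```

Because the published correlations carry only two decimal places, every covariance rebuilt this way inherits that rounding, which is the precision concern noted above.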

The LISREL 8.5 program (Scientific Software International, Chicago, IL) developed by
Jöreskog and Sörbom (1996) was employed to estimate structural models. Anderson and
Gerbing’s (1988) two-step approach to modeling was adopted. The construction of measurement
models for ability and performance was the first step. The scaling of the latent traits in these
models was established by fixing the latent trait loading at 1.000 for one test or task on each
hypothesized dimension.

The  estimation  of  relationships  of  physical  abilities  with  task  performance  was  the  second 
analysis  step.  These  analyses  were  carried  out  with  the  parameters  of  the  measurement  models 
fixed  at  the  values  determined  when  developing  the  measurement  models.  Conceptually,  this 
two-step  procedure  reduces  the  ambiguity  of  research  findings  by  ensuring  that  negative  results 
are  not  merely  manifestations  of  poor  measurement  models  (Meehl,  1990).  Also,  this  approach 
reduces  the  risk  that  a  good  measurement  model  will  mask  poor  fit  in  the  substantive  model 
(McDonald  &  Ho,  2002). 

Models were evaluated by χ² tests that compared the χ² for the model with the χ² for the
null model. The root mean square error of approximation (RMSEA; Browne & Cudeck, 1993;
Steiger, 1990) and the nonnormed fit index (NNFI; Bentler & Bonett, 1980; Tucker & Lewis,
1973) were additional model evaluation criteria (cf. Arbuckle & Wothke, 1999, Appendix C, for
details).
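
The two fit indices can be sketched from their standard definitions. The version below assumes the N − 1 form of the RMSEA denominator (some programs use N instead), and the function names are illustrative. With N = 88 it appears to reproduce the RMSEA reported for the unidimensional model in Table 1 and the NNFI reported for the knee strength row of the empirical two-dimensional block in Table 5, where the comparison model is that block's null model.

```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation; 0 when chi-square does not exceed df."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def nnfi(chi2_null, df_null, chi2_model, df_model):
    """Nonnormed fit index (Tucker-Lewis index) relative to a null model."""
    null_ratio = chi2_null / df_null
    model_ratio = chi2_model / df_model
    return (null_ratio - model_ratio) / (null_ratio - 1.0)


N = 88  # participants with complete field test data

rmsea_unidimensional = rmsea(358.86, 90, N)   # Table 1, unidimensional model
nnfi_knee = nnfi(71.72, 60, 65.45, 59)        # Table 5, knee strength vs. its null

print(round(rmsea_unidimensional, 2), round(nnfi_knee, 2))
```

When χ² falls below the degrees of freedom, this RMSEA is zero and the NNFI exceeds 1.00, which is the situation described as perfect fit for the performance measurement model.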
Table 1. Performance of Alternative Strength Measurement Models

Model               Min t    Max t     df       χ²    RMSEA    NNFI
Logical
  Unidimensional     4.81     8.38     90   358.86      .19     .52
  Modality           4.91     8.23     89   354.49      .19     .52
  Seven^a            5.06    12.83     69   117.04      .09     .81
  Seven^b            5.39    12.81     69   104.42      .08     .82
Empirical
  Two                5.49    10.84     89   248.76      .15     .62
  Three              5.42    10.91     87   194.41      .12     .72

Note. The minimum and maximum t values for individual factor loadings indicate that all of the
strength tests met the accepted criterion for justifying that they were acceptable indicators of the
latent trait to which they were assigned.

^a Leg extension was assigned to the bench press/lat pull-down dimension in this model.

^b Leg extension was assigned to the trunk flexion/trunk extension dimension in this model.


Exploratory  factor  analyses  were  conducted  to  develop  alternatives  to  the  conceptual 
strength  measurement  models.  These  analyses  were  conducted  with  SPSS-PC,  version  16  (SPSS, 
Inc.,  Chicago,  IL).  Kaiser’s  (1960)  criterion  was  one  basis  for  deciding  how  many  factors  to 
extract.  O’Connor’s  (2000)  implementation  of  Horn’s  (1965)  parallel  analysis  criterion  was  a 
second  basis  for  deciding  how  many  factors  to  extract.  The  pattern  loadings  from  an  oblimin 
rotation  were  used  to  assign  tests  to  factors  (cf.  Gorsuch,  1983).  An  oblique  rotation  was  chosen
because  previous  research  suggested  that  strength  factors  tend  to  correlate  (Vickers,  2003b).  If 
this  were  not  the  case  in  the  present  data,  the  oblimin  rotation  would  produce  factors  with  very 
low  correlations.  The  oblique  rotation  provided  the  opportunity  to  identify  a  model  with 
orthogonal  dimensions,  if  appropriate,  without  assuming  that  the  dimensions  should  be 
orthogonal. 


Results 

Measurement  Modality  Model.  Measurement  modality  had  little  effect  on  strength 
measurement  (Table  1).  This  point  was  established  by  comparing  a  unidimensional  strength 
model  to  a  two-dimensional  model  that  was  labeled  the  measurement  modality  model.  One  latent 
trait  in  the  modality  model  was  defined  by  the  set  of  static  isometric  strength  measures.  The 
other  latent  trait  was  defined  by  the  dynamic  isokinetic  measures. 

Two findings suggested that the distinction between static and dynamic strength was
likely to be unimportant. First, the two latent traits were very highly correlated (r = .93). Second,
the modality model did produce a significant improvement in accounting for the observed
strength test covariations, but the improvement was modest in absolute magnitude (Δχ² = 4.37,
1 df, p < .037).
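
The nested-model comparison can be sketched as a χ² difference test using the Table 1 values. The helper below is a simplified illustration that handles only 1 df or an even number of df, for which the upper-tail probability has a closed form; a statistics library would normally be used for the general case.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square value for df = 1 or even df."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df % 2 == 0:
        # For even df the tail is exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2.0) / k
            total += term
        return math.exp(-x / 2.0) * total
    raise ValueError("this sketch handles only df = 1 or even df")


# Unidimensional versus measurement modality model (Table 1).
delta_chi2 = 358.86 - 354.49   # 4.37
delta_df = 90 - 89             # 1
p_value = chi2_sf(delta_chi2, delta_df)

print(round(delta_chi2, 2), delta_df, round(p_value, 3))
```

The result is statistically significant at the conventional .05 level, yet the change is small relative to the total χ² of more than 350, which is the sense in which the improvement is modest.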

Specific Strength Model. A specific strength model was constructed to assess the claim
that narrow strength factors will predict task performance better than broad strength dimensions.
The initial specific strength model contained seven latent traits that were defined a priori:
dynamic and static trunk extension; dynamic and static arm flexion; bench press, trapezius lift,
and leg extension; dynamic and static trunk flexion; right and left handgrip; right and left knee
flexion; and right and left knee extension. Note that with one exception, the dimensions in the
model were defined either by tests that involved dynamic and static measures of a single muscle
action or the same action performed by the right and left muscle groups.


Table 2. Strength Measurement Models from Exploratory Factor Analysis

                                    Number of Factors in Solution
                                   1           2                       3
Test                               A       A       B       A       B       C
Trunk extension (D)             .817    .799            .738
Left knee extension (D)         .791           -.763           -.734
Left knee flexion (D)           .764           -.877           -.869
Right knee extension (D)        .763           -.759           -.724
Trunk extension (S)             .748    .658            .610
Right knee flexion (D)          .706           -.925           -.895
Trunk flexion (D)               .674    .511                            -.726
Trapezius lift (D)              .654    .691            .705
Trunk flexion (S)               .647    .491                            -.879
Arm flexion (S)                 .645    .699            .666
Left handgrip (S)               .617    .398            .491
Right handgrip (S)              .603           -.373    .419   -.395
Bench press (D)                 .593    .726            .708
Leg extension (D)               .558    .491            .455
Arm flexion (D)                 .541    .725            .700

Note. Table entries are the pattern loadings from an oblimin factor solution (cf. Gorsuch, 1983).
Loadings < .30 (absolute) have been dropped to facilitate identification of the factor structure.
“D” indicates a dynamic strength measure; “S” indicates a static strength measure.

The initial model produced a substantial improvement in goodness of fit relative to the
measurement modality model (Δχ² = 237.45, 20 df, p < .001). The modification indices from the
initial analysis of the a priori model indicated that goodness of fit would improve if leg extension
were aligned with the trunk extension dimension instead of its original alignment with the bench
press and trapezius lift. This shift improved on the overall fit by 5.3% (Δχ² = 12.62). The original
alignment of leg extension was speculative, so leg extension was reassigned in the final specific
strength model.
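
The fit differences quoted for the specific strength model can be traced to Table 1. The worked arithmetic below is a sketch using the tabled χ² and df values; the dictionary keys are illustrative labels. The difference of 237.45 with 20 df corresponds to the gap between the measurement modality model and the initial seven-dimensional model, and the 5.3% figure is the further gain from reassigning leg extension expressed as a share of that gap.

```python
# Chi-square and df values from Table 1.
models = {
    "modality": (354.49, 89),
    "seven_initial": (117.04, 69),
    "seven_final": (104.42, 69),
}

gain_initial = models["modality"][0] - models["seven_initial"][0]      # 237.45
df_initial = models["modality"][1] - models["seven_initial"][1]        # 20
gain_reassign = models["seven_initial"][0] - models["seven_final"][0]  # 12.62
percent_gain = 100.0 * gain_reassign / gain_initial                    # about 5.3

print(round(gain_initial, 2), df_initial, round(gain_reassign, 2), round(percent_gain, 1))
```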

Exploratory  Factor  Analyses 

Exploratory factor analysis produced three principal components with λ > 1.00 (λ1 =
7.44, λ2 = 1.43, λ3 = 1.22). The first principal component accounted for 49.6% of the variance;
the first three principal components accounted for 67.3% of the variance. Table 2 presents the
factor structure for models with one, two, and three dimensions.
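
The variance percentages follow from the eigenvalues because a principal components analysis of a correlation matrix has total variance equal to the number of variables, here the 15 strength tests listed in Table 2. A minimal check of that arithmetic:

```python
eigenvalues = [7.44, 1.43, 1.22]  # first three principal components
n_tests = 15                      # strength tests in Table 2

# Each eigenvalue divided by the number of tests is that component's share of variance.
percent_each = [100.0 * ev / n_tests for ev in eigenvalues]
first_pct = percent_each[0]
first_three_pct = sum(percent_each)

print(round(first_pct, 1), round(first_three_pct, 1))
```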
The SPSS routine developed by O’Connor (2000) provided a parallel analysis criterion
(cf. Horn, 1965) for the number of factors. The 95th percentiles for the first, second, and third
components were 1.82, 1.61, and 1.48, respectively. The 50th percentiles were 1.67, 1.51, and
1.39, respectively. By reasonable standards, the second and third factors would be discounted as
chance findings. However, for the present purposes, the λ > 1.00 rule was applied to explore the
maximum number of plausible factor structures. Accordingly, solutions with 1 to 3 factors were
examined.


Table 3. Latent Trait Correlations: Seven-Dimensional Model

                                        1       2       3       4       5       6       7
1. Bench press/trapezius lift        1.00
2. Trunk extension/leg extension     0.63    1.00
3. Arm flexion                       1.04^a  0.69    1.00
4. Trunk flexion                     0.56    0.47    0.61    1.00
5. Handgrip                          0.58    0.42    0.60    0.38    1.00
6. Knee flexion                      0.59    0.46    0.53    0.48    0.55    1.00
7. Knee extension                    0.59    0.55    0.65    0.60    0.57    0.83    1.00

^a Estimated correlations sometimes exceed 1.00 in structural models. This result is attributed to
the sampling variability associated with a true correlation of r ≈ 1.00.
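
The two retention rules used in this section, the eigenvalue-greater-than-one rule and the parallel analysis criterion, can be compared directly with the reported values. This is a sketch of the decision rules only, not of O’Connor’s (2000) routine; it assumes that components beyond the third had eigenvalues below 1.00, as the text implies.

```python
observed = [7.44, 1.43, 1.22]        # observed eigenvalues, components 1 to 3
parallel_95th = [1.82, 1.61, 1.48]   # 95th percentiles from the parallel analysis


def count_kaiser(eigenvalues):
    """Number of components with eigenvalue above 1.00."""
    return sum(1 for ev in eigenvalues if ev > 1.0)


def count_parallel(eigenvalues, thresholds):
    """Number of leading components whose eigenvalue exceeds the random-data threshold."""
    kept = 0
    for ev, cutoff in zip(eigenvalues, thresholds):
        if ev <= cutoff:
            break
        kept += 1
    return kept


n_kaiser = count_kaiser(observed)
n_parallel = count_parallel(observed, parallel_95th)
print(n_kaiser, n_parallel)
```

Only the first component exceeds its random-data threshold, so parallel analysis supports one factor, while the more liberal eigenvalue rule admits three.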

The relationships among the exploratory factor solutions were simple (Table 2). All
strength tests had substantial (>.54) loadings in the unidimensional solution. The
two-dimensional solution primarily separated knee strength tests from the general dimension. The
knee strength (KS) dimension was strongly related (r = -.674) to the GS dimension defined by
the 11 remaining strength tests.

The  three-dimensional  structure  further  subdivided  the  original  GS  factor.  A  trunk 
strength  (TS)  factor  defined  by  the  two  trunk  flexion  tests  was  added  to  the  two-dimensional 
model.  The  KS  factor  was  unchanged  from  the  two-dimensional  model.  The  residual  “general” 
factor  was  strongly  correlated  with  knee  strength  (r  =  -.614)  and  moderately  correlated  with 
trunk  flexion  strength  (r  =  -.419).  KS  and  TS  were  moderately  correlated  (r  =  .326). 

When converted to a structural equation model, the two-dimensional model fit the data
better than both the unidimensional and measurement modality models (Table 1). The
three-dimensional model fit the data better than any of the simpler models, but the specific
strength dimensions model still remained the best option by most criteria.

Table 3 gives the latent trait correlations for the seven-dimensional model. One point to
note is that most of the correlations fell in a rather narrow range (r = .38 to r = .65). The
exceptions were the correlation of the bench press dimension with arm flexion (r = 1.04) and the
correlation of the knee flexion and knee extension dimensions (r = .83). The correlation that
exceeds 1.00 presumably represents a case in which sampling variation makes a correlation that
approaches r = 1.00 appear to exceed the upper limit for correlations.

Other  Physical  Abilities 

The set of ability tests included measures chosen to assess three additional ability
constructs: ME, AP, and AC. A measurement model constructed to measure these three
dimensions included the 4 ME measures, the 2 peak power measures, and VO2max. The latent
trait scaling was established by assigning a loading of 1.000 to static right-hand endurance, peak
power for the leg ergometer, and the AC indicator. The model included only a single AC
indicator even though two aerobic measures were available. Maximal oxygen uptake (VO2) in
liters/min (VO2L) and VO2 in mL·kg⁻¹·min⁻¹ (VO2max) were based on a single measurement
procedure. This commonality made it appear wiser to use just one of the indicators. In each case,
the error variance for this measure was fixed at .000, so the aerobic latent trait corresponded
directly to the measured variable.


Table 4. Comparison of Alternative Models for Other Abilities

Model        df       χ²    RMSEA    NNFI
VO2L         12    39.43      .17     .79
VO2max       12    29.15      .13     .83

The choice of an AC indicator affected the fit of the measurement model. The
measurement model fit the data better with VO2max as the AC indicator than with VO2L. AC was
significantly related to ME (r = .44) and AP (r = .83) in the VO2L model. The VO2max model
produced statistically nonsignificant correlations with ME (r = .10) and AP (r = .16). Despite the
apparent advantage of the VO2max model, both models were employed in parallel when
predicting task performance. The parallel analyses made it possible to evaluate total aerobic
energy expenditure, VO2L, as a predictor of tasks that required absolute levels of energy
expenditure rather than levels adjusted for body size.
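
The two aerobic indicators differ only in whether oxygen uptake is scaled by body mass. As a small illustration, the conversion from absolute uptake (L/min) to relative uptake (mL·kg⁻¹·min⁻¹) is shown below. The 4.0 L/min input is a made-up value for illustration; the body mass is the sample mean reported in the Methods.

```python
def relative_vo2(vo2_l_per_min, body_mass_kg):
    """Convert absolute oxygen uptake (L/min) to relative uptake (mL per kg per min)."""
    return vo2_l_per_min * 1000.0 / body_mass_kg


# Hypothetical soldier with an absolute uptake of 4.0 L/min at the sample's mean weight.
example = relative_vo2(4.0, 79.4)
print(round(example, 1))
```

Two soldiers with the same absolute uptake but different body masses would therefore receive different VO2max scores, which is why the two indicators can relate differently to ME and AP.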

Performance  Measurement 

A unidimensional performance model was constructed initially. The factor loading for
casualty evacuation was fixed at 1.000 to set the scale for the latent variable. The residual χ² was
less than the degrees of freedom, so RMSEA and NNFI indicated perfect fit. The latent trait
loading for digging (λ = 2.56, t = 1.67) did not meet the standard for inclusion as an indicator of a
general factor. The t value for the variance of the latent trait, t = 1.82, also fell short of the
recommended t > 2.00.

The  unidimensional  measurement  model  was  retained  even  though  it  did  not  meet 
accepted  modeling  guidelines.  If  there  really  were  no  true  latent  trait  variability,  all  correlations 
of  the  performance  latent  trait  to  ability  latent  traits  should  equal  zero.  The  variation  in  slit  trench 
digging  performance  can  be  divided  conceptually  into  two  parts.  The  first  part  is  variation  arising 
from the general ability to perform all of the tasks in the study. The second part is variation that
is  specific  to  slit  trench  digging.  If  this  specific  task  is  related  to  ability  tests  or  traits 
independently  of  the  associations  that  would  be  predicted  from  the  relationship  of  general 
performance  to  physical  ability,  those  associations  would  be  indicated  by  the  residuals  or 
modification  indices  for  the  ability-performance  model.  Any  substantial  task-specific 
associations  then  could  be  added  to  the  model. 


Table 5. Strength as a Predictor of Performance

Model                             Model χ²    df    Change in χ²    NNFI     Correlationᵇ

Unidimensional
  Null                               70.61    60          .00        .00
  General strength                   55.66    59        14.95       1.00ᵃ       -.50
Measurement mode
  Null                               72.96    60          .00        .02
  Dynamic strength                   58.01    59        14.95       1.00ᵃ       -.48
  Static strength                    55.03    59        17.93       1.00ᵃ       -.50
  Both                               55.21    58        17.75       1.00ᵃ
Empirical two-dimensional
  Null                               71.72    60          .00        .00
  General strength                   53.19    59        18.53       1.00ᵃ       -.65
  Knee strength                      65.45    59         6.27        .44        -.46
  Both                               52.63    58        19.09       1.00ᵃ
Empirical three-dimensional
  Null                               68.06    60          .00        .00
  General strength                   49.41    59        18.65       1.00ᵃ       -.65
  Knee strength                      62.40    59         5.66        .57        -.46
  Handgrip strength                  60.35    59         7.71        .83        -.49
  All                                48.54    57        19.52       1.00ᵃ       -.64
Seven-dimensional
  Null                               68.13    60          .00        .00
  Upper body strength                54.17    59        13.96       1.00ᵃ       -.52
  Trunk/leg extension strength       56.78    59        11.35       1.00ᵃ       -.46
  Arm flexion strength               53.56    59        14.57       1.00ᵃ       -.52
  Trunk flexion strength             56.63    59        11.50       1.00ᵃ       -.49
  Handgrip strength                  60.90    59         7.23        .76        -.35
  Knee flexion strength              59.54    59         8.59        .93        -.40
  Knee extension strength            57.67    59        10.46       1.00ᵃ       -.46
  All seven                          48.89    53        19.24       1.00ᵃ
  Upper body + arm flexion           53.54    58        14.59       1.00ᵃ

ᵃ NNFI values > 1.00 have been reported as 1.00.
ᵇ This is the correlation of the strength latent trait with the performance latent trait.

Strength and Performance

The analysis of the association of physical abilities and performance began by exploring
the value of treating the strength traits defined by the various measurement models as causal
influences on performance (Table 5). The χ² reduction relative to a model that assumed that
strength and performance were independent was one criterion for choosing between models. A
larger reduction indicated a better model. The correlation of the latent strength trait with the
latent performance trait was a second criterion. A larger correlation indicated a better model.
Model parsimony was a third criterion. A model with one causal effect was preferable to a model
with two causal effects.
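
The fit criteria reported in Table 5 can be recomputed from the χ² and df columns. The sketch below assumes that the NNFI in the table is the standard non-normed (Tucker-Lewis) index computed against the Null row of the same block. That assumption is not stated in the text, but it reproduces the tabled values for the empirical two-dimensional block. The function names are illustrative.

```python
def chi2_reduction(chi2_null: float, chi2_model: float) -> float:
    """Improvement in fit relative to the independence (null) model."""
    return chi2_null - chi2_model


def nnfi(chi2_null: float, df_null: int, chi2_model: float, df_model: int) -> float:
    """Non-normed fit index of a model relative to a null model."""
    ratio_null = chi2_null / df_null
    ratio_model = chi2_model / df_model
    return (ratio_null - ratio_model) / (ratio_null - 1.0)


def reported(value: float) -> float:
    """Round to two decimals and cap at 1.00, following the Table 5 footnote."""
    return min(round(value, 2), 1.00)


# Values copied from the empirical two-dimensional block of Table 5.
NULL_CHI2, NULL_DF = 71.72, 60
delta_general = round(chi2_reduction(NULL_CHI2, 53.19), 2)
delta_knee = round(chi2_reduction(NULL_CHI2, 65.45), 2)
nnfi_general = reported(nnfi(NULL_CHI2, NULL_DF, 53.19, 59))
nnfi_knee = reported(nnfi(NULL_CHI2, NULL_DF, 65.45, 59))

print(delta_general, nnfi_general)
print(delta_knee, nnfi_knee)
```

Under this assumption the general strength model has the larger χ² reduction and an NNFI capped at 1.00, while the knee strength model falls well short on both indices.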

Models based on the GS traits defined in the exploratory factor analyses were the best
option (Table 5). The models that specified this latent trait as a cause of performance differences
produced comparable χ² improvements whether the general dimension was defined by the
two-dimensional model or the three-dimensional model. The improvements approached the upper
limit for any model in the table. Only models that involved multiple latent traits produced greater
improvements in the fit of the model to the data. In each case, the parsimony principle supported
the adoption of the GS model. The improvements in the model χ² were too small to justify
adding causal effects to the model. The empirical GS models were also the best option by the
correlation criterion. The correlation of the strength latent trait with performance was much
stronger in absolute value than in any other model (|r| = .65 vs. |r| < .55).

The evaluation of the seven-dimensional model deserves special comment. This model
did not improve the overall ability to account for the associations of strength tests with
performance. The last model fitted to the data included causal effects of each of the seven
dimensions on performance. The Δχ² for this model (Δχ² = 19.24) was only slightly larger than the
Δχ² values for the models with the empirical GS dimensions as predictors (Δχ² = 18.53 and Δχ² =
18.65 for the two- and three-dimensional variants, respectively). The addition of six causal
parameters to obtain a trivial χ² improvement was unreasonable.

The problem of choosing a model would not have been any easier if the comparison had been
limited to models specified within the seven-dimensional framework. Several models defined
within this framework were roughly comparable. Five models produced χ² improvements
between 10.46 and 14.57, with ability-performance correlations between r = -.46 and r = -.52.
The results for those five models were close enough to suggest that they could be considered
equivalent models for practical purposes.

The  GS  dimensions  from  the  empirical  factor  analyses  provided  the  most  reasonable 
representation  of  strength.  The  exclusion  of  knee  strength  measures  is  the  primary  difference 
between  these  latent  traits  and  the  a  priori  unidimensional  model.  That  exclusion  significantly 
improved  the  ability  to  predict  performance. 

Full  Ability  Model 

The ability dimensions of ME, AP, and AC were added to the two- and three-dimensional
empirical models. Analyses with the two models produced the same pattern of results, but the
three-dimensional model provided better overall predictive accuracy, as would be expected from
the analysis of strength dimensions in isolation from other abilities. Therefore, only the results
for the three-dimensional model are considered here.

The models described in Table 6 can be compared using several criteria. How much does
the ability trait improve the fit of the model relative to the null model? The Δχ² is the index for
this criterion. How strong is the association of the ability trait with performance? The size of the
latent trait correlation is the index for this criterion. How well does the model account for the
overall association of abilities with performance? The number of significant modification indices
for the excluded ability dimensions is the index for this criterion. For example, when considering
the GS model, only 2 of 5 modification indices exceed 3.84, the minimum value that would justify
adding an ability factor to the existing model. The GS model would be superior to the KS model,
which leaves all 5 modification indices above the critical value.


Table 6. Performance Models for the Individual Dimensions in the Full Ability Model

              Model Evaluation Criteria       Modification Indices for Excluded Ability Dimensions
Model      χ²       Δχ²       r        t         GS       KS       HG       ME       AP       AC

Null     575.13
GS       548.95    26.18    -.53    -4.38         —      .06      .69      .85     4.89    10.42
KS       563.62    11.51    -.40    -3.12      11.47       —     4.38     4.80    14.57    16.35
HG       558.09    17.04    -.43    -3.37       9.77     2.19       —     5.40    10.80    13.56
ME       563.60    11.53    -.46    -3.46       8.62     1.64     4.39       —    10.54    13.87
AP       543.80    31.33    -.54    -4.54       2.99      .91      .01      .67       —     2.72
AC       540.55    34.58    -.52    -4.51       7.69     1.84     1.72     3.12     2.09       —

Note. GS = general strength; KS = knee strength; HG = handgrip strength; ME = muscle
endurance; AP = anaerobic power; AC = aerobic capacity.

Table  6  also  provides  some  insight  into  the  necessity  of  including  each  ability  in  the  final 
model.  The  index  for  this  determination  is  the  number  of  modification  indices  >3.84  in  the 
column  headed  by  the  ability  dimension.  For  example,  GS  would  be  considered  for  addition  to  4 
of  5  models,  while  there  would  be  no  reason  to  consider  adding  KS  to  any  of  the  other  models  in 
the  table. 
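
The row and column counts described above can be reproduced directly from the modification indices in Table 6. The sketch below is only a tabulation aid. The dictionary layout and function names are illustrative, and 3.84 is the χ² critical value for 1 df at p = .05 that the text uses as its threshold.

```python
CRITICAL_MI = 3.84  # chi-square critical value for 1 df at p = .05

TRAITS = ["GS", "KS", "HG", "ME", "AP", "AC"]

# Rows: single-predictor model. Columns: excluded trait (same order as TRAITS).
# None marks the trait that is already in the model. Values are from Table 6.
MI = {
    "GS": [None, 0.06, 0.69, 0.85, 4.89, 10.42],
    "KS": [11.47, None, 4.38, 4.80, 14.57, 16.35],
    "HG": [9.77, 2.19, None, 5.40, 10.80, 13.56],
    "ME": [8.62, 1.64, 4.39, None, 10.54, 13.87],
    "AP": [2.99, 0.91, 0.01, 0.67, None, 2.72],
    "AC": [7.69, 1.84, 1.72, 3.12, 2.09, None],
}


def traits_left_unexplained(model: str) -> int:
    """Row count: excluded traits whose modification index exceeds the threshold."""
    return sum(1 for mi in MI[model] if mi is not None and mi > CRITICAL_MI)


def models_improved_by(trait: str) -> int:
    """Column count: single-predictor models that adding this trait would improve."""
    col = TRAITS.index(trait)
    return sum(1 for row in MI.values() if row[col] is not None and row[col] > CRITICAL_MI)


row_counts = {m: traits_left_unexplained(m) for m in TRAITS}
col_counts = {t: models_improved_by(t) for t in TRAITS}

print(row_counts)
print(col_counts)
```

A low row count means the model leaves few abilities unexplained, and a high column count means the trait would be a candidate for addition to most of the other models.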

The ability trait evaluation criteria split the ability traits into two groups. The first group
consisted of the models based on GS, AP, and AC. These models produced Δχ² > 25, accounted
for more than 25% of the performance variance (r < -.51), improved 4 of the other 5 models, and
eliminated at least 3 of the other latent traits from consideration.

The second group consisted of models based on KS, HG, and ME. These models
produced smaller improvements in fit (Δχ² < 20), accounted for less than 22% of the performance
variance (r ≥ -.46), and produced substantial improvements in the fit of the model for at most 2
of the other 5 models. Finally, the models based on these dimensions eliminated no more than 2
of the other latent traits from consideration.

The evaluation of the single predictor models indicated that GS, AP, and AC are
sufficiently similar to be considered competitive alternatives. The KS, HG, and ME models were
not competitive with these three alternatives.

Closer examination of the GS, AP, and AC models indicated that none of them was
consistently superior to the others. The Δχ² criterion ranked the models AC > AP > GS. The
strength of association criterion ranked the models AP > GS > AC. The model improvement
criterion indicated GS = AP = AC. The trait elimination criterion ordered the models AP > AC > GS.
Because there was no one dominant model, combinations of the GS, AP, and AC models were
examined to define a final model (Table 7).


Table 7. Multivariate Models for Performance Prediction

Model        χ²       Δχ²      r1ᵃ      t1ᵇ      r2ᵃ      t2ᵇ      Rᶜ      MIᵈ

Null       575.13
GS+AP      542.02    33.11    -.21    -1.53    -.34    -1.96     .57     7.33
GS+AC      534.43    40.70    -.33    -2.50    -.36    -2.81     .59     0.74
AP+AC      539.41    35.72    -.28    -1.16    -.30    -1.28     .55     7.05

ᵃ ri is the correlation for the first (i = 1) or second (i = 2) latent trait listed in the model name.
ᵇ ti is the t value for the first (i = 1) or second (i = 2) ability latent trait listed in the model name.
ᶜ Multiple correlation of the performance latent trait with the two ability latent traits.
ᵈ Modification index for the ability latent trait that was omitted from the model.


There were several reasons to prefer the GS+AC model when the three 2-predictor
models were compared. This model produced the greatest improvement in fit relative to the null
model. Both of the predictors were significantly related to the performance criterion; none of the
associations was significant in the other models. The multiple correlation, R, was stronger than in
either of the other two models. The modification index for AP was trivial (MI = .74), whereas
these indices would have justified adding a third predictor to either of the other two models (MI
> 7.00).

Effect  of  Choice  of  AC  Indicator 

VO2 in liters was the AC indicator in the initial model evaluations. Oxygen uptake scaled to
body size, VO2max, is often used as the predictor in studies relating oxygen uptake to
performance. The three models in Table 7 were compared a second
time with VO2max in ml/kg/min as the measure of AC (Table 8). In this comparison as in the
previous one, the GS+AC model was the best choice by all criteria.
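
The screening logic behind the two-predictor comparisons can be sketched with the Table 7 values (VO2 in liters as the AC indicator). The thresholds are the ones used in the text, |t| > 2.00 for a parameter and 3.84 for a modification index. The data layout and function name are illustrative assumptions.

```python
T_CRITICAL = 2.00    # recommended t value for a significant parameter
MI_CRITICAL = 3.84   # chi-square critical value for 1 df at p = .05

# model: (delta chi-square, t1, t2, multiple R, MI for the omitted trait), from Table 7
MODELS = {
    "GS+AP": (33.11, -1.53, -1.96, 0.57, 7.33),
    "GS+AC": (40.70, -2.50, -2.81, 0.59, 0.74),
    "AP+AC": (35.72, -1.16, -1.28, 0.55, 7.05),
}


def is_acceptable(name: str) -> bool:
    """Both predictors significant and no case for adding the omitted trait."""
    _, t1, t2, _, mi = MODELS[name]
    return abs(t1) > T_CRITICAL and abs(t2) > T_CRITICAL and mi < MI_CRITICAL


acceptable = [name for name in MODELS if is_acceptable(name)]
best_fit = max(MODELS, key=lambda name: MODELS[name][0])
best_multiple_r = max(MODELS, key=lambda name: MODELS[name][3])

print(acceptable, best_fit, best_multiple_r)
```

With these inputs the same model is selected by every criterion, which is the pattern the text reports for both AC indicators.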
Table 8. Model Comparisons With VO2max as the AC Indicator

                                                                        Modification Indices
Model        χ²       Δχ²      r1ᵃ      t1       r2ᵃ      t2       R       GS       AP       AC

Null       571.71
GS         544.84    26.97    -.53    -4.35                      .53        —     3.21    15.07
AP         544.31    27.40    -.53    -4.22                      .53     3.55        —     8.35
AC         551.95    19.76    -.44    -3.64                      .44    23.22    16.18        —
GS+AP      541.24    30.47    -.31    -1.67    -.29    -1.53     .55        —        —    12.82
GS+AC      529.01    42.70    -.50    -4.50    -.42    -3.86     .63        —      .33        —
AP+AC      533.87    37.84    -.44    -3.65    -.33    -.291     .59     8.65        —        —

ᵃ The subscript indicates the correlation is for the first (1) or second (2) variable listed in the
model name.


The GS+AC model with VO2max as the AC indicator must be compared with the GS+AC
model with VO2 in liters to determine the best available model. In this comparison, the VO2max
model performed better than the VO2 in liters model by every comparison criterion. The standardized
coefficients were larger, the associated t values were larger, the multiple correlation was larger,
and the residual modification index for AP was smaller. The optimum model clearly was
GS+AC with VO2max as the AC indicator.

Residual  Test-Task  Associations 

The residual associations of ability tests with task measures (4 tasks × 22 tests) were
examined for the GS+AC (VO2max) model. The examination was undertaken to determine how
well the general model captured the overall association of tests with tasks. Four of the 88
standardized residuals were large enough to be statistically significant if just one residual had
been examined (VO2max - ammunition box lift, z = -2.69; trapezius endurance - jerry can carry, z
= -2.03; VO2max - jerry can carry, z = -2.68; leg peak power - slit trench digging, z = 2.52). To
place this result in context, 39 of 88 standardized residuals were significant in the null model.
Fourteen of the residuals for the null model were greater in absolute value than the maximum
absolute value (|z| = 2.69) seen in the final model. The largest residual in the null model
(z = -4.35) was much larger in absolute value than the maximum in the final model. Further context is provided by
considering that there is a better than 50:50 chance of finding 4 or more significant associations
by chance when 88 tests are performed (p = .52). Finally, the Bonferroni adjustment to maintain
the overall error rate at p = .05 for the set of 88 residuals is p < .0006, which corresponds to |z| >
3.45 for a 2-tailed test. None of the observed residuals approached this value.
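
The Bonferroni threshold quoted above can be checked with a few lines of standard-library code. This is a minimal sketch of the adjustment only. The function names are illustrative, and the largest observed residual is taken from the values listed in the paragraph.

```python
from statistics import NormalDist


def bonferroni_alpha(overall_alpha: float, n_tests: int) -> float:
    """Per-test significance level that holds the overall error rate at overall_alpha."""
    return overall_alpha / n_tests


def two_tailed_critical_z(alpha: float) -> float:
    """Absolute z value whose two-tailed probability equals alpha."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


n_residuals = 4 * 22                      # 4 tasks x 22 tests
per_test_alpha = bonferroni_alpha(0.05, n_residuals)
z_critical = two_tailed_critical_z(per_test_alpha)
largest_observed = 2.69                   # largest absolute residual in the final model

print(n_residuals, round(per_test_alpha, 4), round(z_critical, 2))
print(largest_observed < z_critical)
```

The largest residual in the final model falls well below the adjusted critical value, consistent with the conclusion that none of the residuals is reliably different from zero.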

Discussion 

The  strength  measurement  models  produced  three  distinct  sets  of  models.  One  set 
consisted  of  the  unidimensional  model,  the  isometric  model,  and  the  isokinetic  model.  A  second 
set  consisted  of  models  based  on  five  of  the  dimensions  from  the  specific  functions  model.  The 
third set consisted of the 9- and 11-test GS dimensions in the empirical models. These sets were
defined by noting that the models within each set were about equally effective in accounting for
the covariation of test scores with task scores. The models within each set also were about
equally  effective  in  predicting  performance. 

The choice among the three sets of equivalent models was straightforward. The
empirical GS dimensions models accounted for 42% of the performance variance (r = -.65). The
models in the other two sets accounted for at most 27% of that variance (general dimensions,
-.50 < r < -.40; specific dimensions, -.52 < r < -.35).
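
The variance percentages follow from squaring the latent trait correlations. The short sketch below uses the strongest correlation reported for each set of models. The labels are illustrative.

```python
def variance_explained(r: float) -> float:
    """Proportion of performance variance accounted for by a single latent trait."""
    return r * r


# Strongest correlation reported for each set of strength models.
strongest = {
    "empirical GS dimensions": -0.65,
    "specific dimensions": -0.52,
    "a priori general dimensions": -0.50,
}

percentages = {label: round(100 * variance_explained(r)) for label, r in strongest.items()}
print(percentages)
```

The sign of the correlation drops out when it is squared, so only the magnitude of the association matters for this comparison.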

The initial steps in constructing the final performance prediction model identified GS,
AP, and AC as the major correlates of performance. GS and AC were the only abilities in the
final model. This pairing replicated the findings of Vickers et al. (2009) for moderate duration
physical tasks.

The  study  shed  light  on  the  belief  that  specific  tests  must  be  administered  to  predict  task 
performance  (McArdle,  Katch,  &  Katch,  2001,  p.  597).  Following  this  logic,  ME  and  AP  would 
be  expected  to  be  the  primary  predictors  of  task  performance.  The  rationale  would  be  that  the 
duration  of  the  physical  tasks  in  this  study  fell  in  a  range  that  is  generally  thought  of  as  requiring 
these  abilities.  However,  the  study  by  Vickers  et  al.  (2009)  also  included  ME  and  AP,  and  it  too 
excluded  those  dimensions  from  the  final  model.  These  negative  findings  might  be  countered  by 
arguing  that  test-task  matching  should  be  carried  out  at  the  level  of  individual  tests  and  specific 
tasks.  If  this  argument  were  correct,  substantial  test-task  residuals  would  be  expected.  While 
some  nominally  significant  (p  <  .05)  residuals  were  found  in  this  study,  the  number  was  no 
greater  than  expected  by  chance.  MacCallum,  Roznowski,  &  Necowitz  (1992)  pointed  out  that  a 
substantial  residual  often  is  a  chance  event  that  does  not  replicate.  Experience  has  shown  that 
test-task  residuals  are  not  likely  to  replicate  when  modeling  the  relationships  of  physical  abilities 
to physical tasks (Vickers, 1996; Vickers et al., 2009). Still, the residuals in this study may
replicate in future work. Until then, there is no reason to invoke test-task specificity as a criterion
for task prediction. One benefit of this conclusion is that the ability-performance model
constructed here can be generalized tentatively to moderate duration tasks that were not covered
in this study. Such generalization would not be possible with a test-task approach. Every task
would require its own model. For the present, the important point is that subjective judgments
based  on  test-task  matching  are  likely  to  be  a  poor  guide  to  identifying  the  key  physical  abilities 
for  task  performance. 

The study extended an earlier demonstration of the risk of omitted variable bias in
ability-performance modeling (Vickers et al., 2009). Bias arises when a causal model omits one
or more causal factors and those omitted causes are correlated with predictors in the model.
When this happens, the model will assign part of the causal effects of the omitted variables to
variables that are in the model, thereby biasing the estimates for the included variables (James et
al., 1982). The final model can be used to illustrate this risk. If HG, KS, ME, or AP had been
studied in isolation, the correlation of each dimension with performance could have been
interpreted as a causal effect. The final model implies that each of these apparent causal effects
would have been the spurious product of omitted variable bias. Any training program based on
those spurious models would do little to improve performance. Because physical abilities often
display moderate to strong correlations, omitted variable bias should always be considered when
constructing causal models relating physical abilities to performance.
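
Omitted variable bias can be illustrated with a small simulation. Everything in the sketch is hypothetical: the data are randomly generated, the variable names and effect sizes (0.8, 0.6) are arbitrary choices, and ordinary least squares stands in for the structural equation models used in the study.

```python
import random


def slope_simple(x, y):
    """OLS slope of y on a single predictor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def slopes_two(x1, x2, y):
    """OLS slopes of y on x1 and x2 jointly, from the centered normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


rng = random.Random(0)
n = 5000
# Hypothetical abilities: 'cause' truly drives performance; 'bystander' is only correlated with it.
cause = [rng.gauss(0, 1) for _ in range(n)]
bystander = [0.8 * c + 0.6 * rng.gauss(0, 1) for c in cause]
performance = [0.6 * c + 0.8 * rng.gauss(0, 1) for c in cause]

# Studied in isolation, the bystander appears to have a substantial effect.
biased_slope = slope_simple(bystander, performance)
# With the true cause in the model, the bystander's apparent effect disappears.
cause_slope, bystander_slope = slopes_two(cause, bystander, performance)

print(round(biased_slope, 2), round(cause_slope, 2), round(bystander_slope, 2))
```

In this simulated example, the ability with no causal effect looks like a useful predictor until the correlated true cause is added to the model, which is the pattern the final model implies for the abilities it excluded.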
Several additional points merit brief comment. First, measurement modality had little
effect on strength assessments. This result replicated an earlier finding by Dempsey et al. (1998).
To date, isometric, isotonic or isoinertial, and isokinetic measures all are reasonable options for
strength measurement. Second, a hierarchical strength measurement model might be appropriate.
Vickers' (2003b) analysis of extensive batteries of tensiometer measures led to the same
conclusion. The lowest level of the hierarchy would consist of specific strength factors. The
intermediate level would distinguish upper body strength from lower body strength. GS would be the
highest level in the model. This hierarchy could combine the specific strength dimension model
developed here with dimensions found in the empirical two-dimensional model and would yield
a GS dimension based on the correlation between the upper and lower body strength dimensions.
Third, the fact that size-adjusted maximal oxygen uptake, VO2max, predicted performance better
than absolute oxygen uptake, VO2L, was surprising since VO2L would seem to be the more
natural indicator of how rapidly the energy demands of a fixed task could be met. The
unanticipated finding may be related to recent observations that high VO2max is associated with
faster activation of aerobic processes during exercise (Kilding, Fysh, & Winter, 2007; Kilding,
Winter, & Fysh, 2006) and with a higher anaerobic threshold (Myers & Ashley, 1997; Paterson &
Cunningham, 1999). Thus, VO2max is an index of the ability to maintain a higher level of
submaximal energy production and to reach that level more rapidly after beginning to work. The
results obtained here may indicate that these factors are more important than the actual maximum
aerobic energy that can be generated. Finally, performance was adequately represented by a
single latent trait. This result was unexpected since the individual performance tasks had been
chosen to represent combat activities that were perceived to require different abilities. The
unidimensional structure of performance suggests task duration may be the key to understanding
which physical abilities are required. All of the tasks in this study were of moderate duration.
Prior work indicates that brief maximal materials-handling tasks also define a single performance
factor (Vickers, 1995, 1996), and that this factor requires a different combination of abilities than
moderate duration tasks similar to those in this study (Vickers et al., 2009). These observations
lead to the inference that systematic task sampling is a critical factor to consider when
identifying abilities that define combat readiness.

In  summary,  the  replication  of  an  earlier  GS+AC  model  for  moderate  duration  tasks  was 
the  most  important  finding  in  this  study.  This  replication  included  rejecting  ME  and  AP  as 
critical  abilities  for  moderate  duration  tasks.  The  GS  dimension  in  the  present  model  was  a  broad 
construct  defined  by  static  and  dynamic  strength  tests  and  encompassing  both  upper  and  lower 
body  strength  tests.  The  study  reinforced  doubts  about  the  effectiveness  of  test-task  specificity  as 
a  basis  for  causal  inferences  about  the  ability-performance  interface.  The  study  also  reinforced 
concerns about omitted variable bias as a problem for performance modeling. The explanation of
why the GS+AC model works well remains uncertain, but the fact that this model now has proven to
be  the  best  option  in  each  of  two  studies  indicates  that  it  is  a  reliable  framework  for  identifying 
abilities  to  target  in  physical  training  programs  undertaken  to  improve  performance  on  moderate 
duration  military  tasks. 
Appendix. Descriptive Statistics From Singh et al. (1991)

Measure                                 M         SD

Anaerobic power indicators
  Leg peak power                      762.8     113.0
  Arm peak power                      444.0      83.3
Aerobic capacity indicators
  VO2max (L/min)                      52.28      0.60
  VO2max (ml/kg/min)                  52.28      6.31
  AT VO2 (ml/kg/min)                  45.52      6.16
Isometric strength tests
  Right handgrip                       55.3       8.2
  Left handgrip                        51.9       8.6
  Arm flexion                          46.7      15.8
  Trunk flexion                        62.9      10.7
  Trunk extension                     171.0      25.1
Isokinetic strength tests
  Right knee flexion                  115.4      19.3
  Left knee flexion                   114.1      22.4
  Right knee extension                157.7      26.6
  Left knee extension                 156.1      25.5
  Arm flexion                         112        20.4
  Trunk flexion                        73.0      10.6
  Trunk extension                     161.7      24.7
  Leg extension                       257.9      62.9
  Trapezius lift                       62.7      15.6
  Bench press                         116.4      26.4
Muscle endurance tests
  Static right handgrip (s)           119.9      52.3
  Static left handgrip (s)            107.0      42.8
  Static arm flexion (s)              109.3      43.9
  Dynamic trapezius lift (reps)        92.5      20.1
Performance tasks
  Casualty evacuation (s)             46.82      8.52
  Ammunition box lift                164.27     50.62
  Jerry can carry                    242.32     30.10
  Digging slit trench                262.04     44.54

Note. Of 88 subjects who completed the weight-loaded march, 6 did not reach 50% of VO2max
and only 20 exceeded 70% of VO2max. This test was definitely submaximal, so much so that the
above information implies that most study participants did not reach their anaerobic threshold
during the march.